Sleep is essential: it restores energy and keeps people healthy. People who do not follow their circadian clock or get enough sleep tire easily and are more likely to fall ill. After a good night's sleep, a person not only feels healthier but also tends to have a more positive mood and better concentration when working or studying (UGA Health Center). However, as technology develops, distractions multiply, and it becomes harder for people to get good sleep every night.
knitr::include_graphics("sleep-deprivation120x960.jpg")
College students are notoriously sleep-deprived and have poor sleep habits. Every time I finish homework due at midnight and leave the library, I see many students still struggling with their assignments. One article states that “An average adult needs between 7.5 and 8 hours of sleep per night” (Shaw, 2010). However, I think most students around me sleep less than that. Sleep seems to have become a luxury that college students cannot afford. Moreover, when students enter college, they have more choices in life and must balance study, sleep and other activities. As social media takes up a large part of people’s lives, it is even harder for college students to get enough sleep.
A study conducted in Australia found that “less than 5 h sleep in the 24 h prior to work and/or more than 16 h of wakefulness can significantly increase the likelihood of fatigue-related impairment and error at work” (Dorrian et al., 2011). For students, such sleep deprivation may make them more stressed and less efficient in their work, and thus lower their GPA. In this project, I explore the relationship between sleep hours and two possible influences on it, workload and time spent on social media, as well as GPA as a possible consequence of sleep deprivation among Emory students.
knitr::include_graphics("collegestudents.jpg")
Hypotheses:

1. Emory students get less sleep than the average US adult.
2. Students with a heavier workload get less sleep.
3. Students who spend more time on social media get less sleep.
4. Students with less sleep receive lower GPAs.
5. Male and female students have the same average sleep hours.
The data were collected from 108 students in the Econ 220 class in Spring 2020, who completed a survey at the beginning of the semester. Valid responses were recorded. From the full dataset, I pick five variables of interest: sleep hours per night, credits enrolled this semester, hours spent on social media per day, GPA and gender.
# Data Cleaning for q9: hours of sleep per night
#Create a table to see the values and frequencies of different values of the variable
table(Prodata$q9)
#re-code the value "6 per night" to "6"
Prodata$q9[Prodata$q9=="6 per night"]<-"6"
#round to the nearest whole hour (halves are rounded up)
Prodata$q9[Prodata$q9=="6.75"]<-"7"
Prodata$q9[Prodata$q9=="7.25"]<-"7"
Prodata$q9[Prodata$q9=="6.5"]<-"7"
Prodata$q9[Prodata$q9=="7.5"]<-"8"
Prodata$q9[Prodata$q9=="8.5"]<-"9"
#verify re-coding
table(Prodata$q9)
summary(Prodata$q9)
#Since q9 is stored as character, convert it to numeric (values that cannot be parsed become NA)
Prodata$q9<-as.numeric(Prodata$q9)
#Change the name of the variable
Prodata$sleep_hrs <- Prodata$q9
#make a summary of the data
summary(Prodata$sleep_hrs)
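The recoding above rounds each answer to the nearest whole hour, with halves rounded up. The same rule can be sketched as a small helper; `clean_sleep_hours()` and the sample answers below are hypothetical, not part of the original analysis. It keeps the leading number of each answer (so "6 per night" becomes 6) and rounds with `floor(x + 0.5)`, because base R's `round()` rounds halves to the nearest even number (`round(6.5)` is 6 and `round(8.5)` is 8), which would not reproduce the recoding above.

```r
# Hypothetical helper: parse free-text sleep answers and round half up
clean_sleep_hours <- function(x) {
  # keep the leading number, e.g. "6 per night" -> "6"; unparseable answers become NA
  num <- suppressWarnings(as.numeric(sub("^[[:space:]]*([0-9]*\\.?[0-9]+).*$", "\\1", x)))
  # round half up (valid for non-negative values), unlike round()'s half-to-even rule
  floor(num + 0.5)
}

# Hypothetical sample answers
raw_answers <- c("6 per night", "6.75", "7.25", "6.5", "7.5", "8.5", "5", NA)
clean_sleep_hours(raw_answers)
# returns 6 7 7 7 8 9 5 NA
```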
# Data cleaning for q72: credit hours enrolled
#Create a table to see the values and frequencies of different values of variable q72
table(Prodata$q72)
#Change the data to numeric
Prodata$q72<-as.numeric(Prodata$q72)
#check the table again
table(Prodata$q72)
#Change the name of the variable
Prodata$credits <- Prodata$q72
#summarise the cleaned variable
summary(Prodata$credits)
# Data cleaning for q86: hours spent on social media
#Create a table to see the values and frequencies of different values of variable q86
table(Prodata$q86)
#Round a few decimal values to the nearest half hour for easier manipulation
Prodata$q86[Prodata$q86=="0.4"]<-"0.5"
Prodata$q86[Prodata$q86=="0.875"]<-"1"
#Change the data to numeric
Prodata$q86<-as.numeric(Prodata$q86)
#check the table again
table(Prodata$q86)
#Change the name of the variable
Prodata$media_hrs <- Prodata$q86
#summarise the cleaned variable
summary(Prodata$media_hrs)
#Data cleaning for sex: gender
#Create a table to see the values and frequencies of different values of variable sex
table(Prodata$sex)
#Change the name of the variable
Prodata$gender <- Prodata$sex
#Data cleaning for q5: GPA
#Create a table to see the values and frequencies of different values of variable GPA
table(Prodata$GPA)
#Change the data to numeric; non-numeric answers become NA (they are not removed)
Prodata$GPA <-as.numeric(Prodata$GPA)
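A note on the `as.numeric()` calls in this section: `na.rm` is not an argument of `as.numeric()`, so the conversion never removes anything. Answers that cannot be parsed as numbers are turned into `NA` (with a coercion warning), and missing values are skipped only later, by functions such as `mean()` that accept `na.rm`. A minimal illustration with hypothetical GPA answers:

```r
# Hypothetical raw GPA answers, not the survey data
gpa_raw <- c("3.7", "4.0", "N/A", "", "3.38")

# Non-numeric answers become NA; suppressWarnings() hides the coercion warning
gpa <- suppressWarnings(as.numeric(gpa_raw))

sum(is.na(gpa))            # 2 answers could not be parsed
mean(gpa)                  # NA, because missing values propagate
mean(gpa, na.rm = TRUE)    # mean of the three valid answers
```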
Since there are only 24 hours in a day, sleep length is likely affected by many activities in our lives. Because the data come from Emory students and the goal is to explore sleep habits among college students, I chose credit hours enrolled and hours spent on social media as explanatory variables: academic workload and social media are two of the biggest demands on students’ time and may therefore shorten their sleep. For the outcome variable, I chose GPA, since it is college students’ biggest concern.
I do not divide the data by class year: most students taking this class are sophomores and juniors, so the few freshmen and seniors would not form representative groups. Instead, I add gender as a grouping variable, although I do not use it in every analysis.
To begin the analysis, we look at the distribution of sleep hours in two tables. The first shows each sleep length and its frequency; the second breaks the counts down by gender. Most students (74 of 108) sleep 6 or 7 hours per night. The gender table and the histogram of sleep hours by gender show similar distributions for female and male students; if anything, male students sleep slightly more on average (about 6.77 hours versus 6.67 hours for female students, computed from the counts in the gender table).
#create a table for sleep hours
ts <- table(Prodata$sleep_hrs)
#Make nice-looking table
kable(ts, caption = "Sleep Hours Per Night Distribution",
col.names = c("Sleep hours", "Frequency (n=108)")) %>%
kable_styling(bootstrap_options = c("striped", "bordered"), full_width = F)
| Sleep hours | Frequency (n=108) |
|---|---|
| 4 | 1 |
| 5 | 11 |
| 6 | 28 |
| 7 | 46 |
| 8 | 21 |
| 9 | 1 |
#Cross-tabulate gender against sleep hours (counts)
t1 <- table(Prodata$gender,Prodata$sleep_hrs)
#Make nice-looking table
kable(t1, caption = "Table of Sleep Hours by gender") %>%
kable_styling(bootstrap_options = c("striped","bordered"), full_width = F)%>%
add_header_above(c("Gender"=1,"Sleep Hours"=6))
| Gender | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|
| Female | 0 | 5 | 15 | 20 | 7 | 1 |
| Male | 1 | 6 | 13 | 26 | 14 | 0 |
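Because the table above gives a count for each whole number of hours, mean sleep by gender can be computed directly as a weighted mean of the hour values, with the counts as weights. This is a quick check on the gender comparison, assuming the table covers all 108 respondents.

```r
# Counts copied from the gender-by-sleep-hours table above
hours  <- 4:9
female <- c(0, 5, 15, 20, 7, 1)
male   <- c(1, 6, 13, 26, 14, 0)

# Weighted mean: sum(hours * count) / sum(count)
avg_sleep <- c(female = weighted.mean(hours, female),
               male   = weighted.mean(hours, male))
round(avg_sleep, 2)
# female 6.67, male 6.77
```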
#Create a histogram of sleep length distribution grouped by gender
p8<-Prodata %>%
ggplot(mapping=aes(x=sleep_hrs))+
geom_histogram(fill="aquamarine3",alpha=0.6,bins=6)+
facet_wrap(~gender)+
labs(title="Sleep Length Histogram by Gender") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Sleep Hours Per Night")
p8
The table below shows summary statistics for sleep hours and three other variables: credits enrolled, hours spent on social media per day, and GPA. The statistics are the minimum, first quartile, median, mean, third quartile and maximum.
#Create a dataframe with only four variables
stats <- data.frame(Prodata$sleep_hrs,Prodata$credits,Prodata$media_hrs,Prodata$GPA)
#Omit the NAs
stats1<-na.omit(stats)
#Make a nice-looking table of the summary statistics of the variables
kable(summary(stats1),
caption = "Summary Statistics of Numeric Variables",
col.names = c("Sleep Hours","Credits","Social Media Hours","GPA")) %>%
kable_styling(bootstrap_options = c("striped", "bordered"),
full_width = F)
| Statistic | Sleep Hours | Credits | Social Media Hours | GPA |
|---|---|---|---|---|
| Min. | 4.00 | 3.00 | 0.000 | 2.000 |
| 1st Qu. | 6.00 | 16.00 | 1.000 | 3.380 |
| Median | 7.00 | 17.00 | 2.000 | 3.700 |
| Mean | 6.72 | 17.27 | 2.995 | 3.590 |
| 3rd Qu. | 7.00 | 19.00 | 4.000 | 3.885 |
| Max. | 9.00 | 24.00 | 12.000 | 4.000 |
The histogram of sleep hours below shows a distribution slightly skewed to the left: the longer tail extends toward short sleep, and the mean (6.72 hours, marked by the vertical dotted line) falls below the median (7 hours). This mean is consistent with the finding that “70%-96% of students get less than 8-hours of sleep per night during the week” (Hershner, 2015).

The graph also shows that none of the 108 students sleeps more than 9 hours, and more than 10% of students (12 of 108) sleep five hours or less on average. This suggests that many Emory students have very busy schedules. However, such short sleep is harmful to their health and well-being.
#create a histogram of sleep hours on the density scale with 6 bins
#add a dotted vertical line at the mean
p1<- ggplot(Prodata, aes(sleep_hrs))+
geom_histogram(aes(y = after_stat(density)), fill="blue", bins=6, alpha=.4)+
labs(title="Sleep Hours Per Night Histogram") +
theme(plot.title = element_text(hjust = 0.5)) +
xlab("Sleep Hours Per Night") +
geom_vline(xintercept = mean(Prodata$sleep_hrs, na.rm = TRUE), linetype="dotted", color="darkblue", linewidth=1)
#Make the ggplot interactive (ggplotly() comes from the plotly package)
ggplotly(p1)
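The left skew of the sleep distribution can also be checked numerically. The sketch below rebuilds the 108 answers from the frequency table of sleep hours (assuming that table covers every respondent) and computes the moment-based sample skewness g1, the third central moment divided by the second central moment raised to the power 1.5; a negative value indicates a longer left tail.

```r
# Rebuild the 108 sleep answers from the frequency table (4 to 9 hours)
sleep <- rep(4:9, times = c(1, 11, 28, 46, 21, 1))

m  <- mean(sleep)
# Moment-based sample skewness: m3 / m2^(3/2)
g1 <- mean((sleep - m)^3) / mean((sleep - m)^2)^1.5

round(c(mean = m, median = median(sleep), skewness = g1), 3)
# mean 6.722, median 7, skewness -0.326
```

A mean below the median together with a negative g1 both point to the same mild left skew.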
First, we check whether sleep hours are related to students’ workload, measured by the number of credits enrolled this semester. Grouping students by workload and comparing group means, we find mean sleep of 6.75 hours for the light-workload group and 6.50 hours for both the medium- and heavy-workload groups. The means are very close, which suggests little difference in sleep hours among the three groups.

The boxplot shows that all three groups have almost the same median. The medium-workload group tends to sleep somewhat less than the light- and heavy-workload groups. The distributions for the light- and heavy-workload groups are roughly the same, except that the heavy-workload group has a longer right tail, meaning some of these students sleep longer. Overall, the boxplot shows no clear relationship between sleep hours and workload.
#Divide credit hours into three workload levels: light (16 or fewer), medium (more than 16, up to 19), heavy (more than 19)
Prodata$credits1 <- cut(Prodata$credits, breaks = c(-Inf, 16, 19, Inf),
labels = c("light workload", "medium workload", "heavy workload"))
#Order the levels so that tables and plots list heavy workload first
Prodata$credits1 <- ordered(Prodata$credits1, levels = c("heavy workload", "medium workload", "light workload"))
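#Show mean sleep hours by workload level in a table
#Drop only rows missing workload or sleep hours (na.omit() on the whole data frame would also drop rows with NAs in unrelated survey columns)
mean_w<-Prodata %>%
filter(!is.na(credits1), !is.na(sleep_hrs)) %>%
group_by(credits1) %>%
summarize(AvgSleep=mean(sleep_hrs)) %>%
kable(digits=3, col.names=c("Workload","Average Sleep Hours"))%>%
kable_styling(bootstrap_options = "striped", full_width = F)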
mean_w
| Workload | Average Sleep Hours |
|---|---|
| heavy workload | 6.50 |
| medium workload | 6.50 |
| light workload | 6.75 |
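Calling `na.omit()` on an entire data frame drops a row if any column is missing, including survey questions unrelated to sleep or workload. When only a few variables are analyzed, it is usually safer to drop rows only when those variables are missing. The toy example below uses hypothetical data to show how much the two approaches can differ.

```r
# Hypothetical survey extract: an unrelated question has missing answers
toy <- data.frame(
  sleep_hrs   = c(7, 6, 8, 5, 7),
  workload    = c("light", "heavy", "light", "heavy", NA),
  unrelated_q = c(NA, "yes", NA, "no", "yes")
)

# na.omit() drops every row with an NA in any column
nrow(na.omit(toy))    # 2

# Keep rows that are complete on the variables actually used
kept <- toy[complete.cases(toy[, c("sleep_hrs", "workload")]), ]
nrow(kept)            # 4
```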
#Use a boxplot to show relationship between sleep hours and students' workload
p4 <- Prodata %>%
ggplot(aes(x=credits1,y=sleep_hrs,color=credits1))+
geom_boxplot(alpha=0.5)+
labs(title = "Sleep Hours distribution by Workload - Boxplot")+
theme(plot.title = element_text(hjust = 0.5))+coord_flip() +
xlab("Level of workload (measured by credit hours enrolled)") +
ylab("Average Sleep Hours Per Night")
p4
The scatterplot below shows the relationship between hours of sleep per night and GPA, with points and smoothed lines colored by gender. The graph shows a slight upward trend, suggesting that longer sleep may be associated with a higher GPA. A formal test is needed before drawing a firmer conclusion.

Besides the sleep pattern, the scatterplot suggests that female students generally have higher GPAs than male students. Although the smoothed line for male students extends further to the left than the one for female students, we cannot conclude that male students generally sleep less than female students, because that leftward extension is due to two outliers.
#create a scatter plot of GPA against sleep hours, colored by gender
ggplot(Prodata, aes(sleep_hrs,GPA,color=gender)) +
geom_point(alpha=0.5) +
geom_smooth(method="loess", formula=y ~ x, se=FALSE) +
labs(title = "GPA by Sleep Hours - Scatterplot")+
theme(plot.title = element_text(hjust = 0.5))+
ylab("GPA") +
xlab("Average Sleep Hours Per Night")
In the introduction, I cited a source stating that an average adult needs 7.5 to 8 hours of sleep per night (Shaw, 2010). I therefore compare the sleep hours in our survey with 7.75 hours, the midpoint of that range, at a 0.05 significance level, to see whether Emory students sleep less than this benchmark. The null hypothesis is that the mean sleep of Emory students equals 7.75 hours; the alternative hypothesis is that it is less than 7.75 hours.

The one-sample, one-sided t-test gives t = -11.18 and p < 2.2e-16, far below 0.05. Thus, we reject the null hypothesis: there is strong evidence that Emory students sleep less than the recommended amount, that is, that they are sleep deprived.
#Use one sample t-test to compare sleep hours with the overall average
s2<-t.test(Prodata$sleep_hrs, mu = 7.75, alternative = "less")
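The one-sample t statistic is t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation, n the sample size and mu0 = 7.75 the benchmark. As a check, the sketch below rebuilds the 108 answers from the sleep-hours frequency table (assuming that table covers the same responses used in the test) and computes t and the one-sided p-value by hand.

```r
# Rebuild the 108 answers from the frequency table of sleep hours
sleep <- rep(4:9, times = c(1, 11, 28, 46, 21, 1))
mu0   <- 7.75   # benchmark: midpoint of 7.5 to 8 hours

n      <- length(sleep)
t_stat <- (mean(sleep) - mu0) / (sd(sleep) / sqrt(n))
# One-sided p-value for the alternative "true mean < mu0"
p_value <- pt(t_stat, df = n - 1)

round(t_stat, 2)   # -11.18
p_value < 2.2e-16  # TRUE

# Same statistic from the built-in test
t.test(sleep, mu = mu0, alternative = "less")$statistic
```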
Since the workload variable has three groups, we use a one-way ANOVA to compare the mean sleep hours of the three groups, with a 0.05 significance level. The F-value is 1.551 and the p-value is 0.217 > 0.05, so we do not have enough evidence to reject the null hypothesis that the three group means are equal. We conclude that the differences in sleep hours among the three groups are not statistically significant, and we find no evidence that students’ sleep time is related to their workload.
# Compute the analysis of variance
res.aov <- aov(sleep_hrs ~ credits1, data = Prodata)
# Summary of the analysis
s1<-summary(res.aov)
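One-way ANOVA compares variation between groups with variation within groups: F = [SS_between / (k - 1)] / [SS_within / (N - k)] for k groups and N observations, and the p-value is the upper tail of an F distribution with k - 1 and N - k degrees of freedom. The sketch below computes this by hand on a small hypothetical dataset (not the survey data) and checks it against `aov()`.

```r
# Hypothetical toy data, NOT the survey responses
toy <- data.frame(
  sleep = c(6, 7, 7, 8,  6, 6, 7, 5,  7, 8, 7, 6),
  load  = factor(rep(c("light", "medium", "heavy"), each = 4))
)

k <- nlevels(toy$load)   # number of groups
N <- nrow(toy)           # number of observations

grand_mean  <- mean(toy$sleep)
group_means <- tapply(toy$sleep, toy$load, mean)
group_n     <- tapply(toy$sleep, toy$load, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_n * (group_means - grand_mean)^2)
ss_within  <- sum((toy$sleep - group_means[as.character(toy$load)])^2)

f_manual <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Compare with R's built-in one-way ANOVA
aov_table <- summary(aov(sleep ~ load, data = toy))[[1]]
c(manual_F = f_manual, aov_F = aov_table[["F value"]][1])
```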
We use a correlation test to explore whether sleep hours and GPA are correlated. The p-value is 0.286 > 0.05, and the 95% confidence interval for the correlation coefficient, from -0.087 to 0.288, contains 0. So we do not have enough evidence to reject the null hypothesis that the correlation coefficient equals 0. In other words, we find no evidence of a linear correlation between sleep hours and GPA.
#Conduct a Pearson correlation test between sleep hours and GPA
s4<-cor.test(Prodata$sleep_hrs, Prodata$GPA, method="pearson")
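For Pearson's r computed from n pairs, the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and the 95% confidence interval comes from Fisher's z transformation, tanh(atanh(r) +/- qnorm(0.975) / sqrt(n - 3)). The sketch below applies these formulas to hypothetical values (not the survey data) and compares them with `cor.test()`.

```r
# Hypothetical sleep hours and GPAs, NOT the survey data
sleep <- c(5, 6, 6, 7, 7, 7, 8, 8, 9, 6)
gpa   <- c(3.2, 3.5, 3.8, 3.6, 3.9, 3.4, 3.7, 4.0, 3.8, 3.3)

n <- length(sleep)
r <- cor(sleep, gpa)

# t statistic and two-sided p-value for H0: correlation = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
ci <- tanh(atanh(r) + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

ct <- cor.test(sleep, gpa, method = "pearson")
rbind(manual = c(t_stat, p_value, ci),
      cor_test = c(ct$statistic, ct$p.value, ct$conf.int))
```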
We use a two-sample t-test to check whether sleep hours differ by gender. The p-value is 0.59 > 0.05, and the 95% confidence interval for the difference in means, from -0.46 to 0.27, contains 0. So we do not have enough evidence to reject the null hypothesis that male and female students sleep the same number of hours on average. In other words, we find no significant difference in sleep hours between genders.
#Two-sample t-test of sleep hours by gender (R's default is Welch's test, which does not assume equal variances)
s5<-t.test(sleep_hrs ~ gender, data = Prodata)
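With a formula and two groups, `t.test()` runs Welch's t-test by default. Because the gender table gives every count, the test can be reproduced by hand; the sketch below assumes that table covers the same 108 responses used in `t.test()`, and computes the Welch statistic, its Welch-Satterthwaite degrees of freedom and the two-sided p-value.

```r
# Rebuild each group's answers from the gender-by-sleep-hours table
hours  <- 4:9
female <- rep(hours, times = c(0, 5, 15, 20, 7, 1))
male   <- rep(hours, times = c(1, 6, 13, 26, 14, 0))

# Squared standard errors of each group mean
v_f <- var(female) / length(female)
v_m <- var(male) / length(male)

t_welch  <- (mean(female) - mean(male)) / sqrt(v_f + v_m)
# Welch-Satterthwaite degrees of freedom
df_welch <- (v_f + v_m)^2 /
  (v_f^2 / (length(female) - 1) + v_m^2 / (length(male) - 1))
p_welch  <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 2)
# t about -0.54, p about 0.59
```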
From the graphs and tests, we find that Emory students sleep less than the 7.5 to 8 hours recommended for an average adult. Thus, we can conclude that Emory students are deprived of sleep. It seems that students are willing to sacrifice rest in order to engage in other parts of school life.

However, exploring the explanatory and outcome variables, we find no clear relationship between sleep hours and workload, time spent on social media, or GPA, so the data do not support my original hypotheses about those variables. Perhaps students’ sleep time is not strongly affected by their workload or social media use. Instead, sleep length may depend more on internal factors such as self-discipline, daily habits and time-management skills. A student with good habits and strong self-discipline can plan their activities well and keep a good balance between sleep, study and social life. So even with a heavy workload and a lot of time on social media, such students can save time elsewhere, for example on eating, watching TV or club activities, to get more sleep.
Moreover, sleep time does not appear to affect students’ GPA much. Perhaps different students need different amounts of sleep to stay alert during the day, or students who are short on sleep find other ways to stay awake and efficient, such as drinking coffee. Even so, although the results show no clear positive correlation between sleep and GPA, students should still get enough sleep to maintain their health and their long-run competitiveness.
The project also has limitations. First, the sample is not representative of college students overall, since it is relatively small and limited to our Econ 220 class. For example, if we expanded the sample beyond Emory, the average sleep time might be longer. However, it is hard to predict how the other variables and the correlations between them would change with more data. Moreover, the project sheds little light on the potential causes and consequences of students’ sleep time. The findings should be combined with data from other sources before drawing firmer conclusions.
knitr::include_graphics("Peds-Sleep_featured.jpg")
Dorrian, Jillian, et al. “Work Hours, Workload, Sleep and Fatigue in Australian Rail Industry Employees.” Applied Ergonomics, vol. 42, no. 2, 2011, pp. 202–209, doi:10.1016/j.apergo.2010.06.009.
Hershner, Shelley. “Is Sleep a Luxury That College Students Cannot Afford?” Sleep Health, vol. 1, no. 1, 2015, pp. 13–14, doi:10.1016/j.sleh.2014.12.006.
Shaw, Gina. “Adult Sleep Needs at Every Age: From Young Adults to the Elderly.” WebMD, 20 Oct. 2010, www.webmd.com/sleep-disorders/features/adult-sleep-needs-and-habits#1.
“Sleep Rocks! …Get More of It!” University Health Center, University of Georgia, www.uhs.uga.edu/sleep.